Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memfd_secret: add memfd-secret file support #2247

Open
wants to merge 192 commits into
base: criu-dev
Choose a base branch
from

Conversation

warusadura
Copy link
Member

@warusadura warusadura commented Aug 17, 2023

Fixes: #2188

prakritigoyal19 and others added 30 commits June 11, 2023 23:30
Change made through this commit:
- Include copy of flog as a seperate tree.
- Modify the makefile to add and compile flog code.

Signed-off-by: prakritigoyal19 <prakritigoyal19@gmail.com>
CID 302713 (checkpoint-restore#1 of 1): Missing varargs init or cleanup (VARARGS)
 va_end was not called for argptr.

Signed-off-by: Adrian Reber <areber@redhat.com>
Separate commit for easier criu-dev <-> master transfer.

Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
It is mapped, not maped. Same applies for mmap I guess.

Found by codespell, except it wants to change it to mapped,
which will make it less specific.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Brought to you by

    codespell -w

(using codespell v2.1.0).

[v2: use "make indent" on the result]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Fixes: checkpoint-restore#2121

Signed-off-by: Pengda Yang <daz-3ux@proton.me>
The TOS(type of service) field in the ip header allows you specify the
priority of the socket data.

Signed-off-by: Suraj Shirvankar <surajshirvankar@gmail.com>
Signed-off-by: Suraj Shirvankar <surajshirvankar@gmail.com>
The pipe_size type is unsigned int, when the fcntl call fails and
return -1, it will cause a negative rollover problem.

Signed-off-by: zhoujie <zhoujie133@huawei.com>
Newer Intel CPUs (Sapphire Rapids) have a much larger xsave area than
before. Looking at older CPUs I see 2440 bytes.

    # cpuid -1 -l 0xd -s 0
    ...
        bytes required by XSAVE/XRSTOR area     = 0x00000988 (2440)

On newer CPUs (Sapphire Rapids) it grows to 11008 bytes.

    # cpuid -1 -l 0xd -s 0
    ...
        bytes required by XSAVE/XRSTOR area     = 0x00002b00 (11008)

This increase the xsave area from one page to four pages.

Without this patch the fpu03 test fails, with this patch it works again.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Using the fact that we know criu_pid and criu is a parent of restored
process we can create pidfile with pid on caller pidns level.

We need to move mount namespace creation to child so that criu-ns can
see caller pidns proc.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
By default, the file name 'amdgpu_plugin.txt' is used also as the name
for the corresponding man page (`man amdgpu_plugin`). However, when
this man page is installed system-wide it would be more appropriate
to have a prefix 'criu-' (e.g., `man criu-amdgpu-plugin`).

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
crun wants to set empty_ns and this interface is missing from the
library. This adds it to libcriu.

Signed-off-by: Adrian Reber <areber@redhat.com>
--criu-binary argument provides a way to supply the CRIU binary
location to run_criu().

Related to: checkpoint-restore#1909

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
These changes remove and update the changes introduced in
7177938 in favor of the
Python version in CI.

os.waitstatus_to_exitcode() function appeared in Python 3.9

Related to: checkpoint-restore#1909

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
These changes add test implementations for criu-ns script.

Fixes: checkpoint-restore#1909

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
These changes fix the `ImportError: No module named pathlib`
error when executing criu-ns tests located at criu/test/others/criu-ns

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
CentOS 7 CI environment uses Python 2. To execute criu-ns
script in CentOS 7 changing the current shebang line to
python is required.

This reverse the changes made in a15a63f

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
This is a patch proposed by Thomas here:
https://lore.kernel.org/all/87ilczc7d9.ffs@tglx/

It removes (created id > desired id) "sanity" check and adds proper
checking that ids start at zero and increment by one each time when we
create/delete a posix timer.

First purpose of it is to fix infinite looping in create_posix_timers on
old pre 3.11 kernels.

Second purpose is to allow kernel interface of creating posix timers
with desired id change from iterating with predictable next id to just
setting next id directly. And at the same time removing predictable next
id so that criu with this patch would not get to infinite loop in
create_posix_timers if this happens.

Thanks a lot to Thomas!

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This hook allows to start image streamer process from an action script.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
…tions

This does cgroup namespace creation separately from joining task
cgroups. This makes the code more logical, because creating cgroup
namespace also involves joining cgroups but these cgroups can be
different to task's cgroups as they are cgroup namespace roots
(cgns_prefix), and mixing all of them together may lead to
misunderstanding.

Another positive thing is that we consolidate !item->parent checks in
one place in restore_task_with_children.

Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
4.15-based kernels don't allow F_*SEAL for memfds created with MFD_HUGETLB.
Since seals are not possible in this case, fake F_GETSEALS result as if it
was queried for a non-sealing-enabled memfd.

Signed-off-by: Michał Mirosław <emmir@google.com>
Linux 4.15 doesn't like empty string for cgroup2 mount options.
Pass NULL then to satisfy the kernel check. Log the options for
easier debugging.

Signed-off-by: Michał Mirosław <emmir@google.com>
The original commit added saving THP_DISABLED flag value, but missed
restoring it. There is restoring code, but used only when --lazy_pages
mode is enabled. Restore the prctl flag always. While at it, rename the
`has_thp_enabled` -> `!thp_disabled` for consistency.

Fixes: bbbd597 (2017-06-28 "mem: add dump state of THP_DISABLED prctl")
Signed-off-by: Michał Mirosław <emmir@google.com>
If prctl(SET_THP_DISABLE) is not used due to bad semantics, log it
for easier debugging.

Signed-off-by: Michał Mirosław <emmir@google.com>
While at it, don't carry over stale errno to the fail() message.

Signed-off-by: Michał Mirosław <emmir@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
Add a sanity check for THP_DISABLE. This discovered a broken commit
in Google's kernel tree.

Signed-off-by: Michał Mirosław <emmir@google.com>
mclapinski and others added 3 commits October 11, 2023 23:56
Signed-off-by: Michal Clapinski <mclapinski@google.com>
Signed-off-by: Michal Clapinski <mclapinski@google.com>
cgroup_ifpriomap test needs net_prio cgroup, which might not be
available. Make the .checkskip script check it.

Signed-off-by: Michał Mirosław <emmir@google.com>
@warusadura warusadura force-pushed the memfd-secret branch 2 times, most recently from e7f3ddc to 3673c4b Compare October 15, 2023 15:26
rst0git and others added 2 commits October 16, 2023 20:36
Newer versions of pip use an isolated virtual environment when building
Python projects. However, when the source code of CRIT is copied into
the isolated environment, the symlink for `../lib/py` (pycriu) becomes
invalid. As a workaround, we used the `--no-build-isolation` option for
`pip install`. However, this functionality has issues in some versions
of PIP [1, 2]. To fix this problem, this patch adds separate packages
for pycriu and crit, and each package is installed independently.

[1] pypa/pip#8221
[2] pypa/pip#8165 (comment)

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Do not use $(USERCFLAGS) for anything other than what the user provide.

Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
Copy link

A friendly reminder that this PR had no activity for 30 days.

SallyKAN and others added 16 commits November 28, 2023 09:12
In the compel/arch/arm/plugins/std/syscalls/syscall.def, the syscall number of bind on ARM64 should be 200 instead of 235

Signed-off-by: Sally Kang <snapekang@gmail.com>
The rawhide netlink errors are fixed with a newer kernel than the
default 6.2 available in Fedora 38.

Signed-off-by: Adrian Reber <areber@redhat.com>
The old test was checking if '/' is btrfs but we should check if the
current directory is btrfs.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Replace a recursive call with a loop.

Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Checkpoint/restore with version 25.0.0-beta.1 fails
with the following error:

$ docker start --checkpoint=c1 cr
Error response from daemon: failed to create task for container: content digest fdb1054b00a8c07f08574ce52198c5501d1f552b6a5fb46105c688c70a9acb45: not found: unknown

Release notes:
moby/moby#46816

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
WARNINGS variable should be amended, not redefined.
We still need, e.g.,  `-Wno-dangling-pointer` to build
criu on loongarch64 with gcc13.

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
If ioctl(TIOCSLCKTRMIOS) fails with EPERM it means that a CRIU
process lacks of CAP_SYS_ADMIN capability. But we can use
ioctl(TIOCGLCKTRMIOS) to *read* current ->termios_locked
value from the kernel and if it's the same as we already have
we can skip failing ioctl(TIOCSLCKTRMIOS) safely.

Adrian has recently posted [1] a very good patch to allow ioctl(TIOCSLCKTRMIOS)
for processes that have CAP_CHECKPOINT_RESTORE (right now it requires CAP_SYS_ADMIN).

[1] https://lore.kernel.org/all/20231206134340.7093-1-areber@redhat.com/

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Newer versions of 'tail' rely on inotify and after a restore 'tail' is
unhappy with the state of inotify and just stops.

This replaces 'tail' with a minimal shell based test (thanks Andrei).

Signed-off-by: Adrian Reber <areber@redhat.com>
The image has a too old version of nettle which does not work with gnutls.
Just upgrade to the latest to make the error go away.

Signed-off-by: Adrian Reber <areber@redhat.com>
Adds protobuf definitions needed to checkpoint and restore
memfd_secret fd.

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
Adds support to save secretmem page stats when C/R a memfd_secret
fd containing process.

Updates images/stats.proto to add statistics about C/R
memfd_secret fd.

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
See "man 2 memfd_secret".

Fixes: checkpoint-restore#2188

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
These changes update `check_pages_counts()` to support
`secmempages_written` stat.

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
Adds a zdtm test to test checkpoint/restore a memfd_secret fd
containing process.

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
@github-actions github-actions bot removed the stale-pr label Dec 17, 2023
@avagin avagin added the no-auto-close Don't auto-close as a stale issue label Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-auto-close Don't auto-close as a stale issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CRIU can't dump memfd_secret fd containing process